(57) The invention relates to secretory gene trap vectors
and methods of using such vectors to isolate extracellular
proteins and to make cells and organisms with mutant
secretory genes. The vectors encode a type II
transmembrane domain and a lumen-sensitive indicator
marker and, optionally, a selectable marker and an exon
splice acceptor site. The gene isolation methods involve
stably introducing the secretory trap vectors into an
endogenous gene, whereby expression of the resultant
fusion protein provides differential expression of the
indicator marker depending on whether the endogenous
gene provides an N-terminal signal sequence.
NOVEL VECTORS AND USE THEREOF FOR CAPTURING TARGET GENES

Inventor: William C. Skarnes

This invention relates to novel vectors and use thereof for capturing target genes 
encoding membrane and secreted proteins. 

BACKGROUND 

Secreted proteins are generally, but not exclusively, characterised in that they
contain an N-terminal extension (or "signal sequence") of roughly 18 to 25 hydrophobic
amino acids. This signal sequence directs the translation product to the secretory pathway
such that the polypeptide translocates across a cell membrane for export from the cell.
For the most part, the signal sequence is proteolytically cleaved from the polypeptide
during the secretion process, so that the final secreted product lacks this sequence.
Secreted proteins in this class include, for instance, polypeptide hormones and cytokines.
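
The signal-sequence feature described above can be illustrated with a minimal Python sketch that flags a hydrophobic stretch near the N-terminus of a protein sequence. The residue set, the 18-residue window, the 30-residue search region and the 0.7 threshold are all assumptions chosen for illustration; they are not taken from this disclosure and the sketch is not a validated predictor.

```python
# Assumed set of hydrophobic residues (one-letter codes); illustrative only.
HYDROPHOBIC = set("AVLIMFWC")


def has_putative_signal_sequence(protein, window=18, search_region=30,
                                 min_fraction=0.7):
    """Return True if some `window`-residue stretch lying entirely within the
    first `search_region` residues is at least `min_fraction` hydrophobic.

    All thresholds are hypothetical defaults, not values from the text.
    """
    region = protein[:search_region].upper()
    if len(region) < window:
        return False
    for start in range(len(region) - window + 1):
        stretch = region[start:start + window]
        fraction = sum(res in HYDROPHOBIC for res in stretch) / window
        if fraction >= min_fraction:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sequences made up for the example.
    print(has_putative_signal_sequence("M" + "L" * 20 + "DEKR" * 5))
    print(has_putative_signal_sequence("MDEKRDEKR" * 5))
```

A scan of this kind only captures the hydrophobicity criterion mentioned in the text; it says nothing about cleavage of the signal sequence.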

Membrane-spanning proteins generally contain, in addition to a signal sequence, 
one or more hydrophobic sequences, often of similar size, downstream of the signal 
sequence. The transmembrane domain prevents further translocation of the polypeptide, 
so resulting in the production of a protein that spans the membrane. Transmembrane 
proteins in this class are all oriented in a type I orientation where the N-terminus of the 
protein is oriented towards the outside of the cell and include, for instance, receptors for 
polypeptide hormones and receptors for cytokines. There also exists a minor class of
membrane-spanning proteins that lack an N-terminal signal sequence, and these proteins
may exist in either a type I or type II orientation (High, 1992). Type II membrane
proteins are inserted in the membrane such that the N-terminus remains in the cytosol.
The orientation that these proteins adopt is largely determined by the charge differential
across the internal transmembrane domain (Hartmann, Rappaport and Lodish, 1989).

A variety of expression cloning strategies have been developed over the years to 
identify secreted and membrane-spanning proteins (Simmons, 1993). These strategies rely
on cloning random cDNAs into expression vectors and screening transfected cells for the
appearance of antigenic determinants on the surface of cells. In a recent embodiment of
this technique, the "signal sequence trap" (Tashiro et al., 1993), cDNA fragments
encoding an N-terminal signal sequence were identified by assaying for the appearance of
a specific antigenic determinant on the surface of transiently transfected cells. Expression 
cloning systems to identify new secreted or membrane-spanning proteins are technically 
demanding and generally favour the detection of abundant mRNA species. Moreover, 
the function and expression profile of genes isolated by these methods cannot be 
ascertained without considerable additional effort. 

"Gene trapping" has been developed to generate random insertional mutations in
genes of eukaryotic cells (Gossler et al. 1989, Brenner et al. 1989, Kerr et al. 1989 and 
Friedrich and Soriano, 1991). Typically such vectors possess structural components 
which facilitate isolation of vector sequences inserted into transcription units so as to form 
recombinant sequences and other elements which allow the resulting recombinant 
sequences to be identified and/or characterised. Thus, for example, the known vectors
may contain sequences which are commonly associated with eukaryotic structural genes,
such as for example, splice acceptor sites which occur at the 5' end of all exons and 
polyadenylation sites which normally follow the final exon. If the vector inserts within an 
intron in the correct orientation the splice acceptor and polyadenylation sites are utilized 
to generate a fusion RNA transcript that contains a portion of the target gene spliced to 
reporter gene sequences of the vector. Other vectors with similar function do not rely on 
splicing, but instead recombine within coding sequences of the target gene simply as a 
result of random recombinational events. 

Each insertion event that activates expression of the reporter gene theoretically
represents an insertion that disrupts the normal coding sequences of the target gene to create
a mutation. Furthermore, expression of the reporter gene is under the regulatory control
of the target gene and thus reporter gene expression should reflect the expression pattern
of the target gene (Skarnes et al. 1992). Lastly, a portion of the target gene contained in
the RNA fusion transcript may be cloned directly from the fusion transcript or from
genomic DNA upstream of the site of insertion (Skarnes et al. 1992, von Melchner et al.
1992, Chen et al. 1994, DeGregori et al. 1994).

Gene trapping in mouse embryonic stem (ES) cells has hitherto offered a rapid,
but essentially random method to identify and simultaneously mutate genes expressed
during mouse development (Skarnes, 1990). There is, however, a need to identify and/or
isolate target eukaryotic genes on the basis of various selection criteria. A particular
class of genes of interest are those that encode secreted and membrane-spanning proteins.

Gene trap vectors typically contain the β-galactosidase reporter gene.
β-galactosidase (β-gal) is a cytosolic enzyme that lacks a signal sequence and
transmembrane domain. This reporter has been a particularly useful tool for the
expression of gene fusions in bacteria due to the fact that β-gal can accommodate
fusions without affecting its enzyme activity (Casadaban, Chou & Cohen, 1980).
Fusions containing all or portions of secreted molecules have been used to define
the requirement for the N-terminal signal sequence to initiate secretion (Benson,
Hall & Silhavy, 1985; Silhavy & Beckwith, 1985). However, these fusions fail to be
exported from the cell, suggesting that β-gal is not able to cross bacterial
membranes. Similarly, the β-gal reporter appears to be incompatible with secretion
pathways in eukaryotic cells. In yeast, β-gal fusions containing the signal sequence
of the invertase enzyme associate with the membrane fraction of the ER but fail to
traverse further along the secretory pathway (Emr et al., 1984). In these examples,
β-gal activity is preserved. In contrast, in Caenorhabditis elegans, β-gal activity
is lost in a fusion that contained the N-terminal signal sequence of a secreted
laminin (Fire, Harrison and Dixon, 1990). Including a predicted type I (Hartmann
et al. 1989) transmembrane domain between the signal sequence and β-gal restored
enzymatic activity to the fusion protein, presumably by keeping β-gal in the
cytosol.

We have now developed a strategy which solves the problems outlined above and
which in its more specific aspects is based on gene trap protocols. A modified gene
trap vector, the secretory trap, was engineered such that the activity of the β-gal
reporter gene is dependent on the acquisition of a signal sequence from the
endogenous gene at the site of vector insertion. Fusions that do not contain a signal
sequence and fail to activate reporters cannot be ascertained without considerable
additional effort.

SUMMARY OF THE INVENTION 

Methods and compositions for detecting and/or 
isolating targeted genes are provided. 

In one aspect, the invention provides a method for
isolating a target eukaryotic gene encoding an extracellular
protein, said method comprising steps:

(1) introducing into a plurality of cells a vector
encoding a type II transmembrane domain and a lumen-sensitive
indicator marker which is preferentially detectable when not
present in a secretory lumen of the cells, wherein said
indicator marker is oriented 3' relative to said type II
transmembrane domain, whereby said vector stably integrates
into the genomes of said plurality of cells to form a
plurality of transgenic cells, wherein in at least one cell of
said plurality of cells, said vector stably integrates into a
gene encoding an extracellular protein having an N-terminal
signal sequence;

(2) incubating said plurality of cells under conditions
wherein said indicator marker is expressed in a preferentially
active form as a fusion protein with an N-terminal region of
said extracellular protein in said cell or a descendent of
said cell, and is unexpressed or expressed in a preferentially
inactive form in said plurality of cells not expressing said
indicator marker as a fusion protein with an N-terminal region
of an extracellular protein having an N-terminal signal
sequence;

(3) detecting the expression of active indicator marker
at said cell or a descendent of said cell;

(4) isolating from said cell or a descendent of said cell
a nucleic acid encoding at least an N-terminal region of said
extracellular protein.

In another aspect, the invention provides a method
for isolating a target eukaryotic gene encoding an
extracellular protein, the method comprising the steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding a first fusion protein comprising a
secretory lumen-sensitive indicator marker and a type II
transmembrane domain positioned N-terminally of the marker,
wherein upon transfer into the cell, the DNA sequence stably
integrates into a gene encoding an extracellular protein
having an N-terminal signal sequence;

(b) incubating the cell in vitro under conditions wherein
the indicator marker is expressed by the cell or descendent of
the cell in a preferentially active form as a second fusion
protein with an N-terminal region of the extracellular
protein;

(c) detecting the presence of the preferentially active
form of the indicator marker, wherein the presence of the
preferentially active form of the indicator marker indicates
that the gene encodes an extracellular protein;

(d) isolating the gene encoding the extracellular
protein.

In a further aspect, the invention provides a method
for making a transgenic cell comprising a mutation in a gene
encoding an extracellular protein, said method comprising
steps:

(1) introducing into a plurality of cells a vector
encoding a type II transmembrane domain and a lumen-sensitive
indicator marker (same as claim 1), wherein said indicator
marker is oriented 3' relative to said type II transmembrane
domain, whereby said vector stably integrates into the genomes
of said plurality of cells to form a plurality of transgenic
cells, wherein in at least one cell of said plurality of
cells, said vector stably integrates into a gene encoding an
extracellular protein having an N-terminal signal sequence;

(2) incubating said plurality of cells or descendents of
said plurality of cells under conditions wherein said
indicator marker is expressed in a preferentially active form
as a fusion protein with an N-terminal region of said
extracellular protein in said cell or a descendent of said
cell, and is unexpressed or expressed in a preferentially
inactive form in said plurality of cells not expressing said
indicator marker as a fusion protein with an N-terminal region
of an extracellular protein having an N-terminal signal
sequence;

(3) detecting the expression of active indicator marker
at said cell or a descendent of said cell; wherein said cell
is a transgenic cell comprising a mutation in a gene encoding
an extracellular protein.

In yet another aspect, the invention provides a
method for making a transgenic cell comprising a mutation in a
gene encoding an extracellular protein, said method comprising
steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding a first fusion protein comprising a
secretory lumen-sensitive indicator marker and a type II
transmembrane domain positioned N-terminally of the marker,
whereby upon transfer into the cell, the DNA sequence stably
integrates into a gene encoding an extracellular protein
having an N-terminal signal sequence;

(b) incubating the cell in vitro under conditions wherein
said indicator marker is expressed by the cell or descendent
of the cell in a preferentially active form as a second fusion
protein with an N-terminal region of the extracellular protein;

(c) detecting the expression of the preferentially active
form of the indicator marker, wherein the expression of the
preferentially active form of the indicator marker indicates
the presence of the second fusion protein, and the presence of
the second fusion protein indicates that the cell is a
transgenic cell comprising a mutation in a gene encoding an
extracellular protein.

In yet another aspect, the invention provides a
vector comprising a DNA sequence encoding a first fusion
protein comprising a secretory lumen-sensitive indicator
marker and a type II transmembrane domain positioned
N-terminally of the marker, wherein upon transfer into a cell
and stable integration of the DNA sequence into a gene
encoding an extracellular protein having an N-terminal signal
sequence, the marker is expressed in an active form as a
second fusion protein with an N-terminal region of the
extracellular protein.

In yet another aspect, the invention provides an
animal cell comprising a vector according to claim 11, wherein
the DNA sequence is stably integrated into a gene of the cell
encoding an extracellular protein having an N-terminal signal
sequence, and the marker is expressed in an active form as a
second fusion protein with an N-terminal region of the
extracellular protein.

In yet another aspect, the invention provides a
vector comprising a DNA sequence encoding a reporter gene and
a type II transmembrane domain positioned N-terminally to the
reporter gene, wherein upon transfer into a cell and stable
integration of the DNA sequence into a target gene, expression
of the reporter gene is detectable if the target gene encodes
a secreted or membrane-spanning protein having a signal
sequence and is undetectable if the target gene codes for a
non-secreted, non-membrane-spanning protein not having a
signal sequence.

In yet another aspect, the present invention
provides a method of detecting a target eukaryotic gene, the
method comprising the steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding a reporter gene and a type II
transmembrane domain positioned N-terminally to the reporter
gene, wherein upon transfer into the cell, the DNA sequence
stably integrates into a target gene;

(b) incubating the cell in vitro under conditions wherein
the reporter gene is expressed by the cell or descendant of
the cell;

(c) determining that the target gene encodes a secreted
or membrane-spanning protein having a signal sequence by the
detection of reporter gene expression or determining that the
target gene encodes a non-secreted, non-membrane-spanning
protein not having a signal sequence by the undetectable
expression of the reporter gene; and

(d) identifying the target gene.

In yet another aspect, the present invention
provides a method of detecting a target eukaryotic gene, the
method comprising the steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding β-galactosidase and a type II
transmembrane domain positioned N-terminally to the
β-galactosidase coding region of the DNA sequence, wherein upon
transfer into the cell, the DNA sequence stably integrates
into a target gene;

(b) incubating the cell in vitro under conditions wherein
the β-galactosidase is expressed by the cell or descendent of
the cell;

(c) determining that the target gene encodes a secreted
or membrane-spanning protein having a signal sequence by the
detection of β-galactosidase expression or determining that
the target gene encodes a non-secreted, non-membrane-spanning
protein not having a signal sequence by the undetectable
β-galactosidase expression; and

(d) identifying the target gene.

According to a first aspect of the present invention, there are provided vectors
comprising a component which upon insertion into a target eukaryotic gene produces a
modified gene which on expression codes for a polypeptide having a first portion of its
amino acid sequence encoded by a nucleic acid sequence of the eukaryotic gene and a
second portion of its amino acid sequence encoded by a nucleic acid sequence of the
vector, characterised in that the vector includes a sequence which confers on the
polypeptide a property which is differentially associated with the presence in the eukaryotic
gene of a nucleic acid sequence coding for an amino acid sequence which results in the
said polypeptide being located in a predetermined spatial relationship with structural
components of the host cell.

A particularly useful class of vectors according to the invention are ones wherein
the vector includes one or more sequences which confer or confers on the polypeptide a
property which is differentially associated with the presence in the eukaryotic gene of a signal
sequence associated with a secreted or membrane-spanning protein. By being
"differentially associated" with the presence in the target eukaryotic gene of a nucleic acid
sequence coding for an amino acid sequence which results in the chimeric polypeptide
being located in a predetermined spatial relationship with structural components of the
host cell, the presence or absence of the differentially associated property allows vectors
of the invention to distinguish between (1) target eukaryotic genes possessing a nucleic
acid sequence coding for an amino acid sequence which results in the chimeric polypeptide
being located in a predetermined spatial relationship with structural components of the
host cell and (2) target eukaryotic genes which are devoid of a nucleic acid sequence
coding for an amino acid sequence which results in the chimeric polypeptide being located
in a predetermined spatial relationship with structural components of the host cell. In
preferred embodiments of the invention, the vectors of the invention can distinguish
between target eukaryotic genes which (1) code for proteins which possess a signal
sequence, e.g. secreted proteins, and ones which (2) code for proteins which do not
possess a signal sequence, e.g. non-secreted proteins.

The aforementioned "conferring" sequences can, for example, comprise at least a
portion of a membrane-associated protein. In this embodiment, the protein product
encoded by a reporter gene element of the vector can be forced to adopt, on integration
into a target gene, one of two configurations, depending on whether or not the gene includes
a signal sequence associated with a secreted or membrane-spanning protein. Thus, for
example, if the gene does code for a secreted or membrane-spanning protein, the membrane-
associated protein element of the vector sequence can cause a reporter gene product to
adopt a configuration in relation to cell components such that the reporter gene product
is activated and produces a detectable signal. Alternatively, if the vector is incorporated
into a target gene which does not code for a secreted or membrane-spanning protein, the
membrane-associated protein element of the vector sequence will cause the reporter gene
product to adopt a configuration in relation to cell components such that the reporter gene
product is not activated and consequently will not produce a detectable signal.

The membrane-associated protein element of the vector sequence is
preferably a type II transmembrane domain, i.e. a domain which includes a membrane-
spanning sequence and any necessary flanking sequences (see below). Thus preferred
vectors include a reporter gene and a sequence encoding a type II transmembrane domain,
preferably placed N-terminally to the reporter, each mutually arranged so that on
expression of the modified gene, detection of reporter polypeptide is dependent upon the
eukaryotic gene coding for a secreted protein having a signal sequence, and the reporter
is substantially undetectable if the eukaryotic gene codes for a non-secreted protein. The
reporter generally provides a characteristic phenotype, e.g. an enzymic activity such as
β-galactosidase activity.
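
The dependence of reporter detection on the captured signal sequence can be summarised as a small decision rule. The following Python sketch is a toy model only: it assumes, as described in this specification, that a fusion lacking a signal sequence is inserted in a type II orientation so that the reporter lies in the ER lumen, and that a fusion carrying a signal sequence keeps the reporter in the cytosol. The function names are hypothetical and chosen for illustration.

```python
def reporter_location(has_signal_sequence: bool) -> str:
    """Toy topology rule for a fusion carrying the vector's type II
    transmembrane (TM) domain N-terminal to the reporter.

    Assumption: with a captured signal sequence the reporter is held in the
    cytosol; without one, the TM domain inserts in a type II orientation and
    the reporter ends up in the ER lumen.
    """
    if has_signal_sequence:
        return "cytosol"
    return "ER lumen"


def reporter_detectable(has_signal_sequence: bool) -> bool:
    """A lumen-sensitive reporter is modelled as detectable only in the cytosol."""
    return reporter_location(has_signal_sequence) == "cytosol"


if __name__ == "__main__":
    for captured in (True, False):
        print(captured, reporter_location(captured), reporter_detectable(captured))
```

The model is deliberately binary; it ignores partial activity and any insertion that fails to produce an in-frame fusion.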

The vectors of the invention preferably include a nucleic acid sequence which
facilitates insertion of said component into the eukaryotic gene. These sequences may, for
example be (a) sequences associated with elimination of intron sequences from mRNA, 
such as, for example splice acceptor sequences, or (b) polyadenylation signal sequences. 
Alternatively, the vector may lack a splice acceptor sequence and thus rely on insertions 
directly into the coding sequences of genes. 

The subject vectors may also include an element allowing selection and/or
identification of cells transformed as a result of components of the vector having been
inserted into the target eukaryotic gene. Such a selectable element conveys a second
property on transformed cells which may be independent of the differentially associated
property; for example, a property allowing selection of cells wherein components of the
vector have been inserted into the target eukaryotic gene as a result of conferring
antibiotic resistance or the ability to survive and/or multiply on a defined medium.
Examples of such marker sequences are ones which result in transformed cells being
resistant to an antibiotic, for example G418, or having a varied degree of dependence on
a growth factor or nutrient. Vectors possessing sequences which result in the chimeric
polypeptide possessing both the differentially associated property and a distinct selectable
property are especially preferred, though the two properties can result from the same
element (e.g. a selectable marker which is differentially active according to a
predetermined spatial relationship with structural components of the cell). Examples
of the former vectors are ones possessing sequences conferring both β-galactosidase
(β-gal) and neomycin (neo) phosphotransferase activities on the chimeric protein. The
construct βgeo, combining sequences conferring both β-galactosidase (β-gal) and
neomycin (neo) phosphotransferase activities in a single construct, is particularly preferred.

As discussed, the selective inactivation of the differentially associated property in
chimeric polypeptides that do not acquire a signal sequence of an endogenous gene
depends on the insertion of the chimeric polypeptide in a type II orientation in the
membrane of the ER. Suitable type II transmembrane domains are preferably identified
empirically, as described below. In addition, the orientation of proteins that contain
internal transmembrane domains (signal anchor sequences) but no signal sequence may
frequently be predicted from the number of positively charged amino acids within 15 amino
acids either side of the transmembrane domain (Hartmann, Rappaport & Lodish, 1989).
However, proteins with a predicted type I orientation may be forced into a type II
orientation if the N-terminus contains many positively charged amino acids. Such
orientation-dispositive flanking sequences are readily identified, as shown with CD4,
below. In these cases, it is necessary to retain these dispositive flanking sequences to
preserve the type II character of the domain. Using these guidelines, transmembrane
domains from any of the known type II proteins may be selected for designing new
secretory trap vectors; suitable transmembrane domains include those from type II
proteins listed by Hartmann, Rappaport & Lodish (1989), for example, transmembrane
domains of human P-glycoprotein, of human transferrin receptor, or of rat Golgi
sialyltransferase. Alternatively, synthetic or hybrid type II transmembrane domains may
be used.
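
The flanking-charge guideline above can be sketched in Python as follows. This is a simplified illustration, not the published rule: it assumes that only lysine (K) and arginine (R) are counted as positively charged, that the transmembrane segment boundaries are already known, and that the more positively charged flank stays in the cytosol. The function name and the example sequences are hypothetical.

```python
# Assumption: only lysine and arginine are counted as positively charged.
POSITIVE = set("KR")


def predict_orientation(protein, tm_start, tm_end, flank=15):
    """Compare positive charges in the `flank` residues on either side of a
    transmembrane segment spanning protein[tm_start:tm_end] (0-based,
    end-exclusive).

    Returns "type II" when the N-terminal flank carries more positive
    charges (N-terminus predicted cytosolic), "type I" when the C-terminal
    flank carries more, and "undetermined" on a tie.
    """
    seq = protein.upper()
    n_flank = seq[max(0, tm_start - flank):tm_start]
    c_flank = seq[tm_end:tm_end + flank]
    n_pos = sum(res in POSITIVE for res in n_flank)
    c_pos = sum(res in POSITIVE for res in c_flank)
    if n_pos > c_pos:
        return "type II"
    if c_pos > n_pos:
        return "type I"
    return "undetermined"


if __name__ == "__main__":
    # Made-up sequences: a 20-residue leucine stretch stands in for the TM segment.
    print(predict_orientation("MKRKRS" + "L" * 20 + "DEDSGS", 6, 26))
    print(predict_orientation("MDESGS" + "L" * 20 + "KRKRGS", 6, 26))
```

As the text notes, such a prediction is only a guide; suitable domains are preferably identified empirically.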




The invention further provides a method of detecting and/or isolating a target
eukaryotic gene encoding a protein which is located in a predetermined spatial relationship
with structural components of the host cell which comprises transforming a cell utilising
a vector as defined above and detecting the transformed cell by assaying for said property
which is differentially associated with the presence in the target eukaryotic gene of a
nucleic acid sequence coding for an amino acid sequence which results in said polypeptide
being located in a predetermined spatial relationship with structural components of the
host cell. In a preferred embodiment, the method is a method for isolating a target
eukaryotic gene encoding an extracellular protein, comprising steps: (1) introducing into
a plurality of cells a vector encoding a type II transmembrane domain and a lumen-
sensitive indicator marker, wherein said indicator marker is oriented 3' relative to said type
II transmembrane domain, whereby said vector stably integrates into the genomes of said
plurality of cells to form a plurality of transgenic cells, wherein in at least one cell of said
plurality of cells, said vector stably integrates into a gene encoding an extracellular protein having
an N-terminal signal sequence; (2) incubating said plurality of cells under conditions
wherein said indicator marker is expressed in a preferentially active form as a fusion
protein with an N-terminal region of said extracellular protein in said cell or a descendent
of said cell, and is unexpressed or expressed in a preferentially inactive form in said
plurality of cells not expressing said indicator marker as a fusion protein with an
N-terminal region of an extracellular protein having an N-terminal signal sequence; (3)
detecting the expression of active indicator marker at said cell or a descendent of said cell;
and (4) isolating from said cell or a descendent of said cell a nucleic acid encoding at least
an N-terminal region of said extracellular protein.
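
Steps (1) to (4) of the preferred embodiment can be pictured as a filter over a population of integration events. The Python sketch below is an illustrative simulation under stated assumptions: each cell carries at most one integration, whether the trapped gene has a signal sequence is given as an input, and the gene names are hypothetical placeholders. It does not model any laboratory procedure.

```python
from typing import List, NamedTuple, Optional, Tuple


class Integration(NamedTuple):
    """One stable integration event from step (1); all fields are hypothetical."""
    cell_id: int
    trapped_gene: Optional[str]      # None: vector landed outside any gene
    gene_has_signal_sequence: bool   # property of the trapped endogenous gene


def screen(integrations: List[Integration]) -> List[Tuple[int, str]]:
    """Steps (2) to (4): keep cells in which the marker is modelled as active
    and return (cell_id, trapped_gene) pairs for isolation."""
    hits = []
    for event in integrations:
        if event.trapped_gene is None:
            continue  # no fusion protein is made, so the marker is unexpressed
        if event.gene_has_signal_sequence:
            # Marker expressed in its preferentially active form
            hits.append((event.cell_id, event.trapped_gene))
        # Otherwise the marker is modelled as preferentially inactive
    return hits


if __name__ == "__main__":
    population = [
        Integration(1, "geneA", True),
        Integration(2, "geneB", False),
        Integration(3, None, False),
        Integration(4, "geneC", True),
    ]
    print(screen(population))
```

Only the cells returned by `screen` would be carried forward to the isolation step in this simplified picture.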

The vectors of these methods comprise a lumen-sensitive marker, i.e. a marker
which is preferentially detectable when not present in a secretory lumen of a cell, e.g. the
ER, Golgi, secretory vesicles, etc. For example, the marker may be an enzyme such as
β-galactosidase which is preferentially inactivated in the lumen. An equivalent way of
practising the invention is to use a marker which is preferentially detectable when present
in the lumen. The important feature is that the marker is differentially detectable
depending upon whether it is present in or outside the lumen. When the marker is preferentially
detectable outside the lumen, the vectors also comprise a type II transmembrane domain
as described above. In addition, the vectors may comprise a selectable marker which may
be the same as or different from the lumen-sensitive marker, as described above.

The vectors may be introduced into the cells by any convenient means. For
example, with cells in culture, conventional techniques such as transfection (e.g. lipofection,
precipitation, electroporation, etc.), microinjection, etc. may be used. For cells within an
organism, introduction may be mediated by virus, liposome, or any other convenient
technique.

A wide variety of cells may be targeted by the subject secretory trap vectors,
including stem cells, pluripotent cells such as zygotes, embryos, ES cells, other stem cells
such as lymphoid and myeloid stem cells, neural stem cells, transformed cells such as
tumour cells, infected cells, differentiated cells, etc. The cells may be targeted in culture
or in vivo.

The vector stably integrates into the genome (i.e. chromatin) of the target cells.
Typically, the vector integrates randomly into the genome of a plurality of the cells,
though in at least one of the cells, the vector integrates into a gene encoding an
endogenous extracellular protein (i.e. secreted or transmembrane protein) having an N-terminal
signal sequence such that the signal sequence is oriented 5' to the vector/insert. Such a cell
then acquires a mutated allele of the extracellular gene comprising at least a portion of the
subject vector encoding the lumen-sensitive marker and the type II transmembrane
domain.

The cells comprising the stably introduced vector are incubated under conditions
whereby the lumen-sensitive marker is expressed in a preferentially detectable form as
a fusion protein with an N-terminal region of the extracellular protein, i.e. the fusion
protein is preferentially detectable via the marker if the endogenous protein portion
includes a functional signal sequence. The incubation conditions are largely determined
by the cell type and may include mitotic growth and differentiation of the originally
transfected cells.

The marker in preferentially detectable form may be detected in any convenient
way. Frequently, the preferential detectability is provided by a change in a marker signal
form or intensity such as a color or optical density change. Cells preferentially expressing
such a signal presumptively comprise a fusion protein comprising an endogenous signal
sequence. The nucleic acid encoding such endogenous signal sequence is then isolated
from the cell by conventional methods, typically by cloning the mutant genomic allele or
a transcript thereof. In this way, genes encoding known and novel extracellular proteins
are obtained. In addition, the subject methods may be modified to obtain products such
as transgenic animals, cell lines, recombinant secretory proteins, etc., some examples of
which are described below.

D ESCRIPTION OF FTfilTRKS 

FIG. 1a shows the vectors designed to express CD4/βgeo fusion proteins and
summarises the results of transient transfection experiments.

FIG. 1b shows the design of gene trap vectors (pSAβgeo and pGT1.8geo) and
the secretory trap vector (pGT1.8TM) and their relative efficiency in stable transfections
of ES cells. pSAβgeo contains the minimal adenovirus type 2 major late splice acceptor
(SA; open box, intron; shaded box, exon) and the bovine growth hormone polyadenylation
signal. The mutation in neo (*) present in pSAβgeo was corrected by replacement of the
ClaI (C)/SphI (S) fragment of βgeo. pGT1.8geo and pGT1.8TM contain the mouse En-2
splice acceptor (Gossler et al., 1989) and SV40 polyadenylation signal but lack a
translation initiation signal (ATG). The secretory trap vector pGT1.8TM includes the 0.7
kb PstI/NdeI fragment of CD4 containing the transmembrane domain inserted in-frame
with βgeo in pGT1.8geo.

FIG. 1c depicts our model for the selective activation of βgal in the secretory
trap vector.

FIG. 1d shows the relative efficiency of secretory trap vectors designed to capture
each of the three reading frames and the exon trap design. Vectors in each reading frame
(pGT1tm to 3tm) were constructed by ExoIII deletion of all but 30 bp of En-2 exon
sequences followed by the insertion of BglII linkers. The exon trap vector (pETtm) was
made by removing the En-2 splice acceptor from pGT1.8TM.

DESCRIPTION OF SPECIFIC EMBODIMENTS

The following experiments and examples are offered by way of illustration and not
by way of limitation.

Modified gene trap vectors (which we have termed "secretory trap") were
developed which rely on capturing the N-terminal signal sequence of an endogenous gene
to generate an active β-galactosidase fusion protein. Using the prototype vector
pGT1.8TM (FIG. 1b), insertions were found in the extracellular domains of a novel
cadherin, an unc-6-related laminin, the sek receptor tyrosine kinase and two
receptor-linked protein tyrosine phosphatases, LAR and PTPk, thus confirming the selective
property of the secretory trap vector to detect insertional mutations in genes encoding
transmembrane and secreted protein products.

The secretory trap strategy was developed starting from lacZ-based gene trap
vectors (Gossler et al., 1989; Brenner et al., 1989; Kerr et al., 1989; Friedrich & Soriano,
1991; Skarnes, Auerbach & Joyner, 1992). These vectors can create N-terminal
β-galactosidase (βgal) fusion products which localise to different compartments of the cell,
presumably reflecting the acquisition of endogenous protein sequences that act as sorting
signals (Skarnes et al., 1992; Burns et al., 1994).

Here, we have exploited the differential sorting of βgal fusion proteins as a means
to capture genes encoding N-terminal signal sequences, genes therefore likely to be
expressed on the cell surface.

Materials and Methods

Vectors. The βgeo reporter was obtained by replacing the ClaI (unique in
lacZ)/SphI (unique in neo) fragment of the gene trap vector pGT1.8 with the ClaI/SphI
fragment of pSAβgeo (Friedrich & Soriano, 1991). pGT1.8 is a derivative of pGT4.5
(Gossler et al., 1989) where the 3' En-2 sequences were replaced with the 0.2 kb
BclI/BamHI SV40 polyA signal. The parental vector pActβgeo contains the 0.5 kb
human β-actin promoter (Joyner, Skarnes & Rossant, 1989) linked to the βgeo/SV40
polyA cassette. The start of βgeo translation was engineered to contain a Kozak
consensus sequence with unique SalI and NruI sites on either side for generating
subsequent fusions. SalI sites were placed at each end of a BalI fragment containing the
entire coding region of the rat CD4 cDNA (Clark et al., 1987). A 0.45 kb SalI/KpnI
fragment containing the N-terminus of CD4 or a 1.4 kb SalI/NdeI fragment containing the
entire CD4 coding region was cloned into SalI/NruI-digested pActβgeo to generate
pActSSβgeo and pActSSTMβgeo, respectively. The secretory trap vector pGT1.8TM
includes the 0.7 kb PstI/NdeI fragment of CD4 containing the transmembrane domain
(TM) inserted in-frame with βgeo in pGT1.8geo.

ES cell culture. CGR8 ES cells (a feeder-independent cell line derived from strain
129/Ola mice by J. Nicholas; Mountford et al., 1994) were maintained in Glasgow
MEM/BHK12 medium containing 0.23% sodium bicarbonate, 1X MEM essential amino
acids, 2 mM glutamine, 1 mM pyruvate, 50 µM β-mercaptoethanol, 10% foetal calf serum
(Globepharm), and 100 units/ml DIA/LIF. Transiently transfected cells were obtained by
electroporating 10^ ES cells with 100 µg plasmid DNA in a volume of 0.8 ml PBS using
a Bio-Rad Gene Pulser set at 250 µF/250 V and cultured for 36 hours on gelatinized
coverslips prior to analysis. To obtain stable cell lines, between 5 x 10^ and 10^ CGR8
ES cells were electroporated (3 µF/800 V) with 150 µg linearised plasmid DNA, 5 x 10^
cells were plated on 10 cm dishes and colonies were selected in 200 ng/ml Geneticin
(Gibco). To assay βgal enzyme activity and protein, ES cells were grown on gelatinized
coverslips and stained with X-gal or with polyclonal rabbit α-βgal antiserum and
FITC-conjugated donkey α-rabbit IgG (Jackson ImmunoResearch). To permeabilize
membranes, cells were treated with 0.5% NP-40 prior to antibody staining.

RNA analysis and RACE cloning. Northern blots and RACE were carried out as
previously described (Skarnes et al., 1992). Several modifications were incorporated into
the 5' RACE procedure used previously (Skarnes, Auerbach & Joyner, 1992): 1)
microdialysis (0.025 micron filters, Millipore) was used in place of ethanol precipitations,
2) nested PCR (30 cycles each) was carried out using an anchor primer and a primer
specific to CD4 followed by size selection on agarose gels and a second round of PCR
with the anchor and the 256 primer and 3) chromospin 400 columns (Clontech) were
used to size select XbaI/KpnI-digested PCR products prior to cloning.

Results

To test if βgal fusions that contain an N-terminal signal sequence could be
identified by their subcellular distribution, vectors were constructed to express portions
of the CD4 type I membrane protein (Clark et al., 1987) fused to βgeo, a chimeric protein
that possesses both βgal and neomycin phosphotransferase activities (Friedrich & Soriano,
1991) (FIG. 1a). βgeo fused to the signal sequence of CD4 (pActSSβgeo) accumulated
in the endoplasmic reticulum (ER) but lacked βgal activity. Therefore, translocation of
βgeo into the lumen of the ER appeared to abolish βgal enzyme function. βgal activity
was restored by including the transmembrane domain of CD4 (pActSSTMβgeo),
presumably by keeping βgal in the cytosol. Active protein was associated with the ER and
in multiple cytoplasmic inclusions, a pattern only rarely observed in ES colonies obtained
with the conventional gene trap vector, probably because insertions downstream of both
a signal sequence and transmembrane domain of genes encoding membrane spanning
proteins are infrequent. Therefore, to identify insertions in both secreted and type I
membrane proteins, our gene trap vector pGT1.8geo was modified to include the
transmembrane domain of CD4 upstream of βgeo (FIG. 1b). Vectors were linearised prior
to electroporation at either the ScaI (Sc) site in the plasmid backbone (represented by the
line) of pSAβgeo or at the HindIII (H) site at the 5' end of the En-2 intron. The number
of G418-resistant colonies obtained in two electroporation experiments (Expt 1: 5 x 10^
cells; Expt 2: 10^ cells) and the proportion that express detectable βgal activity is indicated
on the right. With the secretory trap vector pGT1.8TM, βgal enzyme activity is restored
to any insertion occurring downstream of a signal sequence.

In a pilot experiment, the relative efficiency of our gene trap vector was compared
to the original pSAβgeo following electroporation into ES cells. Although pSAβgeo
contains a start of translation which is absent in our vectors, fewer G418-resistant colonies
were obtained with pSAβgeo than with pGT1.8geo. More importantly, nearly all the
colonies derived with pSAβgeo showed high levels of βgal activity, whereas our vector
showed a broad range of staining intensities and a greater proportion of βgal-negative
colonies. Sequence analysis of the pSAβgeo vector revealed a point mutation in neo
known to reduce its enzyme activity (Yenofsky, Fine & Pellow, 1991). Therefore, the
pSAβgeo vector appears to pre-select for genes expressed at high levels, and correction
of the neo mutation in our vectors now allows us to access genes expressed at low levels
(see below).

Approximately half of the pGT1.8geo colonies express detectable βgal activity and
show various subcellular patterns of βgal staining observed previously. In contrast, only
20% of the pGT1.8TM colonies express βgal activity and all display the "secretory"
pattern of βgeo activity characteristic of the pActSSTMβgeo fusion. Stable cell lines
transfected with pGT1.8TM in most cases showed detectable βgal activity in
undifferentiated ES cells; however, we occasionally found ES cell lines that exhibited
detectable βgal activity only in a subset of differentiated cell types. The reduction in the
proportion of βgal-positive colonies and the singular pattern of βgal staining observed
with the secretory trap vector suggested that βgal activity is retained only in fusions that
contain an N-terminal signal sequence and that βgal activity, but not neo activity, is lost
in fusions with proteins that do not possess a signal sequence.

Our data indicate that in the absence of a cleavable N-terminal signal sequence, the
fusion protein behaves as a type II membrane protein (High, 1992), placing βgeo in the
ER lumen where the βgal enzyme is inactive (FIG. 1c). To confirm this, several
βgal-negative cell lines were isolated and analysed by immunofluorescence. βgal-negative cell
lines were identified from immunodotblots of whole cell lysates using α-βgal antibodies
and the ECL detection system (Amersham). From a screen of 48 colonies, three
βgal-negative cell lines were recovered and analysed by immunofluorescence. In these lines,
the fusion protein was detected on the surface of cells in the absence of detergent
permeabilization, indicating a type II orientation of the βgeo fusion protein. In contrast,
detergent permeabilization was essential to detect the fusion protein in βgal-positive cell
lines, as would be expected for type I membrane proteins.

A model for the observed selective activation of βgal in the secretory trap vector
is presented in FIG. 1c. Insertion of pGT1.8TM (hatched box) in genes that contain a
signal sequence produces fusion proteins that are inserted in the membrane of the
endoplasmic reticulum in a type I configuration. The transmembrane domain of the vector
retains βgal in the cytosol where it remains active. Insertion of the vector in genes that
lack a signal sequence produces fusion proteins with an internal TM domain. In these
fusions, the transmembrane domain acts as a signal anchor sequence (High, 1992) to place
βgeo in a type II orientation, exposing βgeo to the lumen of the ER where βgal activity
is lost. This dependence of enzyme activity on acquiring an endogenous signal sequence
provides a simple screen for insertions into genes that encode N-terminal signal sequences.
Further proof for this model has come from cloning several genes associated with several
secretory trap insertions.

5' RACE (rapid amplification of cDNA ends) was used to clone a portion of the
endogenous gene associated with secretory trap insertions that express detectable βgal
activity (Table 1). Northern and RNA dot blot analysis showed that approximately one
half (5 of 11 analyzed in this study) of the G418-resistant cell lines fail to properly utilize
the splice acceptor and produce fusion transcripts that hybridize with intron sequences of
the vector. These insertions presumably do not represent true gene trap events and thus
were not analyzed further. Northern blot analysis of six properly-spliced lines detected
a unique-sized βgal fusion transcript in each cell line. For these experiments, a Northern
blot of 15 µg ES cell RNA was hybridised with the lacZ gene and reprobed with a RACE
cDNA fragment cloned from the ST534 (LAR) insertion. At least two independent
RACE cDNAs were cloned from each cell line. The cDNAs obtained from all cell lines
except ST514 detected both the fusion transcript and an endogenous transcript common
to all cell lines as shown for the ST534 probe. The ST514 insertion illustrates that genes
expressed at very low levels in ES cells can be trapped. In ST514 cultures, βgal activity
was observed only in a few differentiated cells and accordingly neither the fusion nor the
endogenous transcripts could be detected on Northern blots.

Sequence analysis of the RACE cDNAs in all cases showed the proper use of the
splice acceptor and a single open reading frame in-frame with βgeo. One insertion
occurred in netrin, a secreted laminin homologous to the unc-6 gene of C. elegans (Ishii
et al., 1992) recently cloned in the chick (Serafini et al., 1994). The remaining five
insertions interrupted the extracellular domains of membrane spanning proteins: a novel
cadherin most closely related to the fat tumour suppressor gene of Drosophila (Mahoney
et al., 1991), the sek receptor tyrosine kinase (Gilardi-Hebenstreit et al., 1992), the
receptor-linked protein tyrosine phosphatase PTPk (Jiang et al., 1993), and two
independent insertions in a second receptor-linked tyrosine phosphatase LAR (Streuli et
al., 1988). These results support the prediction that βgal activity is dependent on acquiring
an N-terminal signal sequence from the endogenous gene at the site of insertion.

The pattern of βgal expression in embryos derived from insertions in the sek
(ST497) and netrin-1 (ST514) genes was very similar to published RNA in situ results
for the mouse sek (Nieto et al., 1992) and chick netrin (Kennedy et al., 1994) genes,
providing further proof that gene trap vectors accurately report the pattern of endogenous
gene expression (Skarnes, Auerbach & Joyner, 1992). For these experiments, chimeric
embryos and germline mice were generated by injection of C57BL/6 blastocysts (Skarnes,
Auerbach & Joyner, 1992). Embryos at the appropriate stages were dissected, fixed and
stained with X-gal as described (Beddington et al., 1989). Both insertions in LAR
(ST484, 534) exhibited weak, widespread expression in 8.5d embryos. The insertion in
PTPk (ST531) showed βgal expression in endoderm and paraxial mesoderm, highest in
newly condensing somites. βgal expression in tissues of adult mice carrying insertions in
LAR and PTPk correlated well with known sites of mRNA expression (Jiang et al., 1993;
Longo et al., 1993). Highest levels of βgal activity were found in the lung, mammary
gland and brain of ST534 (LAR) mice and in the kidney, brain and liver of ST531 (PTPk)
mice.

ES cell lines containing insertions in the LAR, PTPk, and sek genes have been
transmitted to the germline of mice. Following germline transmission of the PTPk and
LAR insertions, breeding analysis showed that mice homozygous for either insertion are
viable and fertile. To confirm that the LAR and PTPk genes were effectively disrupted,
Northern blots of RNA from wild-type and homozygous adult tissues were probed with
cDNAs from regions downstream of each insertion site. Northern blots of 10 µg RNA
from wild-type (+/+), heterozygous (+/-) and homozygous (-/-) lung of ST534 (LAR) and
kidney of ST531 (PTPk) adult mice were hybridized with LAR and PTPk cDNA
sequences 3' to the insertion and reprobed with the ribosomal S12 gene as a loading
control. For both mutations, normal full-length transcripts were not detected in
homozygous animals.

Because secretory trap insertions generate fusions that in some cases will contain
a large portion of the extracellular domain of the target gene, the production of both loss
of function and gain of function (i.e., dominant-negative) mutations is possible.
However, since the βgeo fusions with LAR and PTPk include less than 300 amino acids
of the extracellular domains of these proteins, these insertions likely represent null
mutations. LAR and PTPk are members of an ever-increasing family of receptor PTP
genes (Saito, 1993). The absence of overt phenotypes in LAR and PTPk mutant mice is
likely due to functional overlap between gene family members, as has been observed with
targeted mutations in multiple members of the myogenic and Src-family genes (Rudnicki
et al., 1993; Stein, Vogel & Soriano, 1994; Lowell, Soriano & Varmus, 1994).

Based on the first six genes identified, the secretory trap shows a preference for
large membrane-spanning receptors. The recovery of two independent insertions in LAR
further suggests that the current vector design will access a restricted class of genes. The
requirement for gene trap vectors to insert in introns of genes is predicted to impose an
inherent bias in favour of detecting genes composed of large intronic regions and
consequently limit the number of genes accessible with this approach. To access a larger
pool of genes, we have constructed vectors in each of the three possible reading frames.
Furthermore, to recover insertions in smaller transcription units composed of few or no
introns, we developed an "exon trap" version of the vector that lacks a splice acceptor.
The relative efficiencies of secretory trap vectors engineered in all three reading frames
and the exon trap vector are given in FIG. 1d. Electroporations of CGR8 ES cells were
carried out as described above (Expt 1: 2 x 10^ cells; Expt 2: 10^ cells). Each vector
yielded similar numbers of G418-resistant colonies, a similar proportion of which exhibit
the secretory pattern of βgal activity. With a combination of vectors, one obtains a more
representative sampling of the genome that should include both membrane receptors and
secreted ligands.
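
The reason a set of three reading-frame vectors broadens coverage can be illustrated with
a short Python sketch. It assumes, for illustration only, that an in-frame fusion requires
the number of endogenous coding nucleotides upstream of the splice junction, taken modulo 3,
to match the frame for which a vector was built. The assignment of pGT1tm, pGT2tm and
pGT3tm to phases 0, 1 and 2, and the function names, are hypothetical and are not taken
from the specification.

```python
# Hypothetical phase assignment for the three reading-frame vectors (an assumption,
# not stated in the specification).
VECTOR_PHASE = {"pGT1tm": 0, "pGT2tm": 1, "pGT3tm": 2}


def junction_phase(upstream_coding_nt: int) -> int:
    """Phase of the splice junction: coding nucleotides upstream of it, modulo 3."""
    if upstream_coding_nt < 0:
        raise ValueError("upstream_coding_nt must be non-negative")
    return upstream_coding_nt % 3


def in_frame_vectors(upstream_coding_nt: int) -> list[str]:
    """Names of the vectors whose reporter would be read in frame at this junction."""
    phase = junction_phase(upstream_coding_nt)
    return [name for name, vector_phase in VECTOR_PHASE.items() if vector_phase == phase]


if __name__ == "__main__":
    # Example junctions with 300, 301 and 302 upstream coding nucleotides (made-up values).
    for n in (300, 301, 302):
        print(n, in_frame_vectors(n))
```

Under these assumptions each single vector is in frame for only one of the three possible
junction phases, whereas the set of three covers every phase.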

It will be appreciated that in this invention we have shown that the βgeo reporter
gene can be modified to contain an N-terminal transmembrane domain. Integration into
an endogenous gene encoding an N-terminal signal sequence produces a fusion protein
that assumes a type I configuration, keeping βgal in the cytosol where it retains functional
enzyme activity (see FIG. 1c (i)). Conversely, if the modified reporter integrates into a
gene that does not encode a signal sequence, the hydrophobic transmembrane domain
itself is now recognised by the cell as a signal anchor sequence to place the fusion protein
in a type II orientation, where βgal activity is lost (see FIG. 1c (ii)).
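
The two outcomes described above amount to a small decision rule, which can be sketched
in Python as follows. This is only an illustrative model under stated assumptions: the
names Insertion, fusion_topology, bgal_compartment and bgal_active are hypothetical and
do not come from the specification, and the rule ignores the splicing and reading-frame
requirements that must also be met before any fusion protein is produced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record describing one vector integration event."""
    gene: str
    has_signal_sequence: bool  # does the trapped gene supply an N-terminal signal sequence?


def fusion_topology(ins: Insertion) -> str:
    """Membrane orientation the model predicts for the fusion protein."""
    # With an endogenous signal sequence the fusion adopts a type I configuration;
    # without one, the vector TM domain acts as a signal anchor (type II).
    return "type I" if ins.has_signal_sequence else "type II"


def bgal_compartment(ins: Insertion) -> str:
    """Compartment to which the reporter is exposed under the model."""
    return "cytosol" if fusion_topology(ins) == "type I" else "ER lumen"


def bgal_active(ins: Insertion) -> bool:
    """The reporter is modelled as active only when it stays in the cytosol."""
    return bgal_compartment(ins) == "cytosol"


if __name__ == "__main__":
    # Gene labels are placeholders, not data from the experiments.
    for ins in (Insertion("secretory gene", True), Insertion("non-secretory gene", False)):
        print(ins.gene, fusion_topology(ins), bgal_compartment(ins), bgal_active(ins))
```

In this sketch reporter activity depends on a single input, the presence of an endogenous
signal sequence, which is what lets a simple activity assay stand in for a test of gene class.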

Therefore, a construct in which βgal or βgeo is prefixed by an N-terminal type II
transmembrane domain has a unique property. If the construct integrates into a 'secretory'
gene encoding a signal sequence, the βgal remains active. If it integrates into a
non-secretory gene, βgal activity is blocked. This permits integrations into secretory genes to
be identified by a simple assay (e.g. color change) for reporter gene activity.

The embodiments of the invention in which an exclusive
property or privilege is claimed are defined as follows:

1. A method for isolating a target eukaryotic gene
encoding an extracellular protein, said method comprising
steps:

(1) introducing into a plurality of cells a vector
encoding a type II transmembrane domain and a lumen-sensitive
indicator marker which is preferentially detectable

present in a secretory lumen of the cells, wherein said
indicator marker is oriented 3' relative to said type II
transmembrane domain, whereby said vector stably integrates
into the genomes of said plurality of cells to form a
plurality of transgenic cells, wherein in at least one cell of
said plurality of cells, said vector stably integrates into a
gene encoding an extracellular protein having an N-terminal
signal sequence;

(2) incubating said plurality of cells under conditions 
wherein said indicator marker is expressed in a preferentially 
active form as a fusion protein with an N-terminal region of 
said extracellular protein in said cell or a descendent of 
said cell, and is unexpressed or expressed in a preferentially 
inactive form in said plurality of cells not expressing said 
indicator marker as a fusion protein with an N-terminal region 
of an extracellular protein having an N-terminal signal 
sequence; 

(3) detecting the expression of active indicator marker
at said cell or a descendent of said cell;

(4) isolating from said cell or a descendent of said cell
a nucleic acid encoding at least an N-terminal region of said
extracellular protein.

2. A method for isolating a target eukaryotic gene
encoding an extracellular protein, the method comprising the
steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding a first fusion protein comprising a
secretory lumen-sensitive indicator marker and a type II
transmembrane domain positioned N-terminally of the marker,
wherein upon transfer into the cell, the DNA sequence stably
integrates into a gene encoding an extracellular protein
having an N-terminal signal sequence;

(b) incubating the cell in vitro under conditions wherein
the indicator marker is expressed by the cell or descendant of
the cell in a preferentially active form as a second fusion
protein with an N-terminal region of the extracellular
protein;

(c) detecting the presence of the preferentially active
form of the indicator marker, wherein the presence of the
preferentially active form of the indicator marker indicates
that the gene encodes an extracellular protein;

(d) isolating the gene encoding the extracellular 
protein. 

3. A method according to claim 1 or claim 2, wherein
said vector further encodes a selectable marker.

4. A method according to any one of claims 1 to 3,
wherein said cell is an embryonic stem cell.

5. A method according to any one of claims 1 to 4, 
wherein said preferentially active form is inferred from a 
detectable amount of a catalytic activity and said 
preferentially inactive form is inferred from an indetectable
amount of said catalytic activity. 

6. A method for making a transgenic cell comprising a
mutation in a gene encoding an extracellular protein, said
method comprising steps:

(1) introducing into a plurality of cells a vector
encoding a type II transmembrane domain and a lumen-sensitive
indicator marker (same as claim 1), wherein said indicator
marker is oriented 3' relative to said type II transmembrane
domain, whereby said vector stably integrates into the genomes
of said plurality of cells to form a plurality of transgenic
cells, wherein in at least one cell of said plurality of
cells, said vector stably integrates into a gene encoding an
extracellular protein having an N-terminal signal sequence;

(2) incubating said plurality of cells or descendents of
said plurality of cells under conditions wherein said
indicator marker is expressed in a preferentially active form
as a fusion protein with an N-terminal region of said
extracellular protein in said cell or a descendent of said
cell, and is unexpressed or expressed in a preferentially
inactive form in said plurality of cells not expressing said
indicator marker as a fusion protein with an N-terminal region
of an extracellular protein having an N-terminal signal
sequence;

(3) detecting the expression of active indicator marker
at said cell or a descendent of said cell; wherein said cell
is a transgenic cell comprising a mutation in a gene encoding
an extracellular protein.

7. A method for making a transgenic cell comprising a
mutation in a gene encoding an extracellular protein, said
method comprising steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding a first fusion protein comprising a
secretory lumen-sensitive indicator marker and a type II
transmembrane domain positioned N-terminally of the marker,
whereby upon transfer into the cell, the DNA sequence stably
integrates into a gene encoding an extracellular protein
having an N-terminal signal sequence;

(b) incubating the cell in vitro under conditions wherein
said indicator marker is expressed by the cell or descendent
of the cell in a preferentially active form as a second fusion
protein with an N-terminal region of the extracellular protein;

(c) detecting the expression of the preferentially active
form of the indicator marker, wherein the expression of the
preferentially active form of the indicator marker indicates
the presence of the second fusion protein, and the presence of
the second fusion protein indicates that the cell is a
transgenic cell comprising a mutation in a gene encoding an
extracellular protein.

8. A method according to claim 6 or claim 7, wherein
said vector further encodes a selectable marker. 

9. A method according to any one of claims 6 to 8, 
wherein said cells are pluripotent cells. 

10. A method according to any one of claims 6 to 9,
wherein said preferentially active form is inferred, from a 
detectable amount of a catalytic activity and said 
preferentially inactive form is inferred from an indetectable 
amount of said catalytic activity. 

11. A vector comprising a DNA sequence encoding a first
fusion protein comprising a secretory lumen-sensitive
indicator marker and a type II transmembrane domain positioned
N-terminally of the marker, wherein upon transfer into a cell
and stable integration of the DNA sequence into a gene
encoding an extracellular protein having an N-terminal signal
sequence, the marker is expressed in an active form as a
second fusion protein with an N-terminal region of the
extracellular protein.

12. A vector according to claim 11, wherein the vector
further encodes a selectable marker.

13. An animal cell comprising a vector according to
claim 11, wherein the DNA sequence is stably integrated into a
gene of the cell encoding an extracellular protein having an
N-terminal signal sequence, and the marker is expressed in an
active form as a second fusion protein with an N-terminal
region of the extracellular protein.

14. A vector comprising a DNA sequence encoding a
reporter gene and a type II transmembrane domain positioned
N-terminally to the reporter gene, wherein upon transfer into a
cell and stable integration of the DNA sequence into a target
gene, expression of the reporter gene is detectable if the
target gene encodes a secreted or membrane-spanning protein
having a signal sequence and is undetectable if the target
gene codes for a non-secreted, non-membrane-spanning protein
not having a signal sequence.

15. The vector according to claim 14, wherein the vector 
further comprises a gene encoding a selectable marker.

16. The vector according to claim 15, wherein the 
selectable marker is for antibiotic resistance. 

17. The vector according to claim 16, wherein the 
antibiotic resistance is to G418. 

18. The vector according to claim 16, wherein the
selectable marker is dependence on a nutrient.

19. The vector according to claim 15, wherein the
selectable marker is dependence on a growth factor.

20. The vector according to claim 14, wherein the 
reporter gene encodes an enzymatic activity. 

21. The vector according to claim 14, wherein the 
reporter gene encodes β-galactosidase.

22. A method of detecting a target eukaryotic gene, the 
method comprising the steps: 

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding a reporter gene and a type II
transmembrane domain positioned N-terminally to the reporter
gene, wherein upon transfer into the cell, the DNA sequence
stably integrates into a target gene;

(b) incubating the cell in vitro under conditions wherein
the reporter gene is expressed by the cells or descendant of
the cell;

(c) determining that the target gene encodes a secreted
or membrane-spanning protein having a signal sequence by the
detection of reporter gene expression or determining that the
target gene encodes a non-secreted, non-membrane spanning
protein not having a signal sequence by the undetectable
expression of the reporter gene; and

(d) identifying the target gene.

23. A method of detecting a target eukaryotic gene, the
method comprising the steps:

(a) introducing into a cell in vitro a vector comprising
a DNA sequence encoding β-galactosidase and a type II
transmembrane domain positioned N-terminally to the
β-galactosidase coding region of the DNA sequence, wherein upon
transfer into the cell, the DNA sequence stably integrates
into a target gene;

(b) incubating the cell in vitro under conditions wherein
the β-galactosidase is expressed by the cells or descendent of
the cell;

(c) determining that the target gene encodes a secreted
or membrane-spanning protein having a signal sequence by the
detection of β-galactosidase expression or determining that
the target gene encodes a non-secreted, non-membrane spanning
protein not having a signal sequence by the undetectable
β-galactosidase expression; and

(d) identifying the target gene. 

24. An animal cell comprising a vector according to
claim 14, wherein the DNA sequence is stably integrated into a
gene of the cell, and wherein expression of the reporter gene
permits identifying genes which encode a secreted or
membrane-spanning protein having a signal sequence and genes which
encode a non-secreted, non-membrane spanning protein not
having a signal sequence.

25. An animal cell comprising a vector according to
claim 21, wherein the DNA sequence is stably integrated into a
gene of the cell, and expression of β-galactosidase permits
identifying genes which encode a secreted or membrane-spanning
protein having a signal sequence and genes which encode a
non-secreted, non-membrane spanning protein not having a signal
sequence.

26. An isolated pluripotent cell comprising a vector
according to claim 14, wherein the DNA sequence is stably
integrated into a gene of the cell, and wherein expression of
the reporter gene permits identifying genes which encode a
secreted or membrane-spanning protein having a signal sequence
and genes which encode a non-secreted, non-membrane spanning
protein not having a signal sequence.

27. An isolated pluripotent cell comprising a vector
according to claim 21, wherein the DNA sequence is stably
integrated into a gene of the cell, and expression of
β-galactosidase permits identifying genes which encode a
secreted or membrane-spanning protein having a signal sequence
and genes which encode a non-secreted, non-membrane spanning
protein not having a signal sequence.

ABSTRACT

The invention relates to secretory gene trap vectors and methods of using such vectors
to isolate extracellular proteins and to make cells and organisms with mutant secretory
genes. The vectors encode a type II transmembrane domain and a lumen-sensitive
indicator marker and optionally, a selectable marker and an exon-splice acceptor site. The 
gene isolation methods involve stably introducing the secretory trap vectors into an 
endogenous gene whereby the expression of the resultant fusion protein provides a 
differential expression of the indicator marker depending on whether the endogenous gene 
provides an N-terminal signal sequence.